Analysis of Autonomous Systems
AA120Q: Building Trust in Autonomy, Stanford University.
Lecture 8
We will discuss a variety of methods for analyzing the behavior of autonomous systems.
Assignment:
Run simulations to characterize collision avoidance performance against a variety of metrics; develop methods for visualizing the decision making behavior of your system.
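As a sketch of the kind of characterization the assignment asks for, the snippet below runs Monte Carlo encounters and converts raw event counts into rates per million flight hours. The encounter model, its probabilities, and the function names here are illustrative placeholders, not the assignment's actual simulator:

```julia
using Random

# Hypothetical stand-in for an encounter simulation: returns whether
# an NMAC occurred and whether the system alerted.
function simulate_encounter(rng)
    alerted = rand(rng) < 0.05                  # placeholder alert probability
    nmac = alerted ? rand(rng) < 0.001 : rand(rng) < 0.01
    return (nmac=nmac, alerted=alerted)
end

function characterize(n_encounters; hours_per_encounter=0.05, seed=0)
    rng = MersenneTwister(seed)
    nmacs = alerts = 0
    for _ in 1:n_encounters
        r = simulate_encounter(rng)
        nmacs += r.nmac
        alerts += r.alerted
    end
    total_hours = n_encounters * hours_per_encounter
    scale = 1e6 / total_hours                   # counts → per million flight hours
    return (nmac_rate=nmacs * scale, alert_rate=alerts * scale)
end

rates = characterize(100_000)
```

Reporting rates per million flight hours, rather than raw counts, makes results comparable across simulation runs of different sizes.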
Measuring System Effectiveness
It is the designer's responsibility to go back to the field and assess the impact that the autonomous system is having. This measurement process must be both qualitative and quantitative.
Qualitative: Deals with the quality of a result. Does the policy followed by the agent look good? Is it behaving reasonably?
Quantitative: Objective values that can be quantified. For example, with ACAS X, one should look at operational data on airborne collisions, near-misses, and separation after ACAS X has been put into place.
Reward
Autonomous agents are often trained to maximize their reward. Does your agent receive high reward? That's all you care about, right?
Wrong.
We care about the performance of the system in the real world, and the real world is never perfectly modeled.
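A toy illustration of this model mismatch, with an entirely hypothetical alerting policy and sensor model of our own invention: a threshold policy tuned assuming one sensor noise level earns noticeably less reward when the "real world" noise is larger.

```julia
using Random

# Toy policy: alert when a noisy range estimate drops below a threshold.
# Reward penalizes missed close encounters (-10) and nuisance alerts (-1).
function avg_reward(threshold, σ; n=100_000, rng=MersenneTwister(1))
    total = 0.0
    for _ in 1:n
        dist = 5 + 10 * rand(rng)        # true range, uniform on [5, 15]
        est = dist + σ * randn(rng)      # sensor estimate with noise σ
        alert = est < threshold
        is_close = dist < 8
        total += is_close && !alert ? -10.0 : (!is_close && alert ? -1.0 : 0.0)
    end
    return total / n
end

r_model = avg_reward(8.0, 0.5)   # reward under the assumed noise model
r_world = avg_reward(8.0, 3.0)   # same policy in a noisier "real world"
```

The policy's reward estimated in simulation overstates what it achieves when the deployment environment differs from the training model.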
The Pareto Frontier
When optimizing a real-world system one often must balance a large number of trade-offs.
Which of the following is better?
System A: an airborne collision avoidance system that has 1 collision and 1000 alerts per million flight hours
System B: an airborne collision avoidance system that has 2 collisions and 10 alerts per million flight hours
```julia
using Plots
scatter([1, 2], [1000, 10], color=:black, label=nothing,
        xlabel="NMACs per million flight hours",
        ylabel="Alerts per million flight hours")
```

Fewer collisions are good, and fewer alerts are good, but we cannot say which collision avoidance system is better without making a judgment on their relative value.
We know, therefore, that if we have a set of policies:
```julia
scatter([0.5, 0.75, 1, 1.5, 2, 3, 1.8, 0.8, 2.5, 1.2, 0.7, 1.4],
        [1e5, 3e4, 1e4, 1e3, 10, 2, 2e4, 5e4, 1e4, 1.2e4, 9e4, 5e4],
        color=:black, label=nothing,
        xlabel="NMACs per million flight hours",
        ylabel="Alerts per million flight hours")
```

The policies that are potentially the best are those which cannot be shifted to be made better in both respects:
```julia
begin
    plot([0.5, 0.75, 1, 1.5, 2, 3], [1e5, 3e4, 1e4, 1e3, 10, 2],
         color=:red, markerstrokecolor=:red, marker=:star5, label=nothing)
    scatter!([1.8, 0.8, 2.5, 1.2, 0.7, 1.4], [2e4, 5e4, 1e4, 1.2e4, 9e4, 5e4],
             color=:gray, markerstrokecolor=:gray, label=nothing,
             xlabel="NMACs per million flight hours",
             ylabel="Alerts per million flight hours")
    annotate!([(1.3, 7e4, text("Approx. Pareto Frontier", 14, :red)),
               (1.8, 3e4, text("Suboptimal", 14, :gray)),
               (0.7, 5e3, text("Infeasible", 14, :black))])
end
```

The Pareto Frontier is obtained by adjusting the tradeoff between your multiple objectives and optimizing a model for each setting to trace out the curve.
The region closer to the origin than the Pareto Frontier is infeasible, whereas the region farther from the origin than the Pareto Frontier is suboptimal.
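Given a finite set of evaluated policies, the (approximate) Pareto frontier can be extracted with a simple pairwise dominance check; both objectives are minimized here, and the function names are our own:

```julia
# A policy dominates another if it is no worse in both objectives
# and strictly better in at least one (both objectives minimized).
dominates(a, b) = all(a .<= b) && any(a .< b)

# Return the indices of non-dominated (Pareto-optimal) points.
function pareto_indices(points)
    [i for (i, p) in enumerate(points) if
        !any(dominates(q, p) for (j, q) in enumerate(points) if j != i)]
end

# Each tuple is (NMACs, alerts) per million flight hours.
policies = [(0.5, 1e5), (1.0, 1e4), (2.0, 10.0), (1.8, 2e4), (1.2, 1.2e4)]
front = pareto_indices(policies)   # → [1, 2, 3]
```

The last two policies are dominated by (1.0, 1e4), which has both fewer NMACs and fewer alerts, so they fall in the suboptimal region.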
Given a Pareto Frontier, how do we choose the best policy?
This is often a subjective question, and it typically requires careful consideration of factors that are not in your optimization objective. Domain experts are often consulted.
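One common way to make that judgment explicit is to scalarize the objectives with a weight expressing how many alerts you would trade for one avoided NMAC. The weight λ below is an illustrative assumption, not a value endorsed by any standard:

```julia
# Pick the frontier policy minimizing a weighted cost
#   cost = NMAC rate + λ * alert rate,
# where λ encodes the (subjective) relative cost of an alert vs. an NMAC.
function choose_policy(frontier, λ)
    costs = [nmac + λ * alerts for (nmac, alerts) in frontier]
    return argmin(costs)
end

# Frontier points as (NMACs, alerts) per million flight hours.
frontier = [(0.5, 1e5), (0.75, 3e4), (1.0, 1e4), (1.5, 1e3), (2.0, 10.0), (3.0, 2.0)]
choose_policy(frontier, 1e-2)   # alerts weighted heavily → picks a low-alert policy
choose_policy(frontier, 1e-5)   # alerts weighted lightly → picks a low-NMAC policy
```

Sweeping λ traces out choices along the frontier, which makes the tradeoff explicit but does not remove the subjectivity: someone still has to pick λ.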
Inspect the Decision Making Behavior
Has your agent really learned to do what it was designed to do?
If you have trained a neural network to recognize cats, how do you know whether the neural network has really learned what a cat is?
Below we see the result of optimizing the input to a neural network trained to recognize dumbbells. It turns out that the network sees dumbbells as dumbbells with forearms attached.
```julia
PlutoUI.LocalResource("./figures/dumbbell.png")
```

The Black Swan Problem
This problem is known as the Black Swan Problem. The problem gets its name from the black swans of Australia and New Zealand, and the incorrect induction a European observer might make:
All swans I have seen are white, therefore all swans are white
Of course, once said European goes to southern Australia and sees a black swan, they can either revise their belief or forever categorize the black swan as an entirely different species.
With autonomous agents, we want to make sure that they identify the correct categories, which is often a non-trivial problem.
For an Autonomous Car, are these Pedestrians?
Sure looks like a pedestrian.